--- category: projects ---
To match the plot of the MnDOT, I’ll use an area plot, so I use the same code but instead of geom_point(), I’ll use geom_area(). I’ll also need to plot two different sets of data on one plot: ‘2017-2019 Daily Average’ and ‘2020 Daily Total’. To do this, I’ll create a new view in SQL that calculates 2017-2019 daily averages in one column and includes the 2020 counts in a second new column. However, before I do that, the MnDOT report combined the ‘In’ and ‘Out’ directions for each day, so I’ll start by doing that.
Right now, the table looks like this:
counts_daily <- tbl(con, 'counts_daily')
head(counts_daily)
## # Source: lazy query [?? x 9]
## # Database: mysql [root@:/counts]
## bikeometer_id date direction count is_weekend year month day
## <int> <date> <chr> <int64> <int> <int> <int> <int>
## 1 10 2011-09-02 I 343 0 2011 9 2
## 2 10 2011-09-02 O 312 0 2011 9 2
## 3 10 2011-09-03 I 344 1 2011 9 3
## 4 10 2011-09-03 O 354 1 2011 9 3
## 5 10 2011-09-04 I 352 1 2011 9 4
## 6 10 2011-09-04 O 348 1 2011 9 4
## # ... with 1 more variable: month_day <chr>
In MySQL, I’ll create a new view in MySQL that takes the sum of both the ‘I’ and ‘O’ values and group them by ‘date’ and ‘bikeometer_id’.
For those not familiar with MySQL views, they look like tables but are actually recalculated each them they are accessed. This is what I want, because my python program imports the new Bike Arlington data into my ‘counts_daily’ MySQL table. Each time the ‘counts_daily_total’ view is accessed, it will automatically update to account for new data added to the ‘counts_daily’ table. Thus, I don’t need to change my python program to generate new and useful tables in MySQL.
Now, let’s take a look at our new view.
counts_daily_total <- tbl(con, 'counts_daily_total')
head(counts_daily_total)
## # Source: lazy query [?? x 5]
## # Database: mysql [root@:/counts]
## bikeometer_id date count is_weekend month_day
## <int> <date> <dbl> <int> <chr>
## 1 10 2011-09-02 655 0 09-02
## 2 10 2011-09-03 698 1 09-03
## 3 10 2011-09-04 700 1 09-04
## 4 10 2011-09-05 459 0 09-05
## 5 10 2011-09-06 211 0 09-06
## 6 10 2011-09-07 194 0 09-07
As you can see, the ‘I’ and ‘O’ directions are combined!
In a previous post, I chose a time-frame and which Bikeometers were best to plot. So I’ll filter by creating a new view in MySQL called ‘vw_filtered_counts_daily_total_2017_to_2019’.
filtered_counts_daily_total_2017_to_2019 <- tbl(con, 'filtered_counts_daily_total_2017_to_2019')
filtered_counts_daily_total_2017_to_2019
## # Source: table<filtered_counts_daily_total_2017_to_2019> [?? x 5]
## # Database: mysql [root@:/counts]
## bikeometer_id date count is_weekend month_day
## <int> <date> <dbl> <int> <chr>
## 1 18 2017-01-01 52 1 01-01
## 2 18 2017-01-02 14 0 01-02
## 3 18 2017-01-03 43 0 01-03
## 4 18 2017-01-04 116 0 01-04
## 5 18 2017-01-05 74 0 01-05
## 6 18 2017-01-06 51 0 01-06
## 7 18 2017-01-07 7 1 01-07
## 8 18 2017-01-08 15 1 01-08
## 9 18 2017-01-09 49 0 01-09
## 10 18 2017-01-10 63 0 01-10
## # ... with more rows
df <- tbl(con, 'the_best_table')
# These Bikeometers are all located in the Arlington region
In order for this table to be easily read by R, we need to covert it from a wide table to a long table. This is the resource I used to understand how to do the conversion.
df.long <- pivot_longer(df, cols=2:3, names_to='year', values_to='counts')
Here I added the year to the end of the all the month_day values then convert that to a date class so R has an easy time graphing it.
df.long <- mutate(df.long, month_day=paste(month_day, '-2020', sep = ''))
df.long <- collect(df.long)
Next, I converted each character string in the month_day column into a date class.
#```{r} df.long$month_day <- mutate(month_day = as.Date(month_day, format=‘%m/%d/%Y’))
df.long
#```
Then I’ll filter for the dates in my range
df.long.filtered <- df.long %>% filter(month_day >= '03-12-2020' & month_day <= '05-15-2020') %>% collect()
df.long.filtered
## # A tibble: 130 x 3
## month_day year counts
## <chr> <chr> <dbl>
## 1 03-12-2020 average_2017_to_2019 88.4
## 2 03-13-2020 average_2017_to_2019 101.
## 3 03-14-2020 average_2017_to_2019 93.7
## 4 03-15-2020 average_2017_to_2019 83.8
## 5 03-16-2020 average_2017_to_2019 63.9
## 6 03-17-2020 average_2017_to_2019 55.2
## 7 03-18-2020 average_2017_to_2019 97
## 8 03-19-2020 average_2017_to_2019 110.
## 9 03-20-2020 average_2017_to_2019 108.
## 10 03-21-2020 average_2017_to_2019 71.8
## # ... with 120 more rows
Finally I convert the month_day values from a ‘character class’ to a time class. Be sure to lowercase the ‘m’ and ‘d’ in ‘%m-%d-%Y’.
df.long.filtered$month_day <- strptime(df.long.filtered$month_day, '%m-%d-%Y')
class(df.long.filtered$month_day[1])
## [1] "POSIXlt" "POSIXt"
df.long.filtered$month_day <- as.Date(df.long.filtered$month_day)
class(df.long.filtered$month_day[1])
## [1] "Date"
#library(ggpmisc)
p <- df.long.filtered %>% ggplot(aes(x = month_day, y = counts)) +
geom_line(aes(group = year, color = year))
#scale_x_date()
#stat_peaks(span = 30)
print(p)
p <- df.long.filtered %>% ggplot(aes(x = month_day, y = counts)) +
geom_area(aes(group = year, color = year))
print(p)
It looks like there definitely was an increase but it doesn’t look as nice as the MnDOT plot. Let’s investigate this a little.
I’ll map the locations of the Bikeometers to see where exactly they are in Arlington. To accomplish this, I found this video by Professor Lisa Lendway very helpful along with her website.
On the left are the numbers we need to enter into the bbox variable below in order to get the graph from Stamen Maps. I’ve chosen ‘terrain’ as my map type and the ‘zoom’ will be 13, which is in the openstreetmap url just after the text ‘map=’.
library(ggmap)
## Warning: package 'ggmap' was built under R version 4.0.5
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.0.4
# Get the map information
arlington <- get_stamenmap(
bbox = c(left = -77.2377, bottom = 38.8075, right = -76.9744, top = 38.9080),
maptype = "terrain",
zoom = 12)
## Source : http://tile.stamen.com/terrain/12/1169/1566.png
## Source : http://tile.stamen.com/terrain/12/1170/1566.png
## Source : http://tile.stamen.com/terrain/12/1171/1566.png
## Source : http://tile.stamen.com/terrain/12/1172/1566.png
## Source : http://tile.stamen.com/terrain/12/1169/1567.png
## Source : http://tile.stamen.com/terrain/12/1170/1567.png
## Source : http://tile.stamen.com/terrain/12/1171/1567.png
## Source : http://tile.stamen.com/terrain/12/1172/1567.png
## Source : http://tile.stamen.com/terrain/12/1169/1568.png
## Source : http://tile.stamen.com/terrain/12/1170/1568.png
## Source : http://tile.stamen.com/terrain/12/1171/1568.png
## Source : http://tile.stamen.com/terrain/12/1172/1568.png
Next we will import the Bikeometer data from my database that contains the longitude and latitudes needed to plot the Bikeometers.
sql_cmd <- "SELECT * FROM counts.bikeometer_details WHERE bikeometer_id in (14,15,16,18,22,31,39)"
# creates a lazy table
bikeometer_table <- dbGetQuery(con, sql_cmd)
bikeometer_table
## bikeometer_id name latitude longitude region region_id
## 1 18 Crystal NB bike lane 38.857315 -77.049152 Arlington 1
## 2 14 Fairfax EB bike lane 38.882767 -77.104615 Arlington 1
## 3 22 Military NB bike lane 38.905509 -77.109349 Arlington 1
## 4 39 Potomac Yard Trail #1 38.829850 -77.047850 Alexandria 2
## 5 16 Quincy NB bike lane 38.885457 -77.108055 Arlington 1
## 6 15 Quincy SB bike lane 38.884877 -77.108075 Arlington 1
## 7 31 Roosevelt Bridge 38.893822 -77.065737 Arlington 1
One Bikeometer stands out as not even being in Alrington… how did that get in there? Let’s plot it anyway to see how close it is to Arlington. Realistically, I frequently bike into Alexandria on the weekends but I’ll probably end up excluding it none the less.
# Convert latitude and longitude to integers
bikeometer_table$longitude <- as.numeric(bikeometer_table$longitude)
bikeometer_table$latitude <- as.numeric(bikeometer_table$latitude)
# Plot the points on the map
ggmap(arlington) + # creates the map "background"
geom_point(data = bikeometer_table,
aes(x = longitude, y = latitude, color = region),
alpha = 0.5,
size = 3) +
theme_map()
From my experience, the most popular bike trail in Arlington will be on the Arlington Loop. I want to get a sense if these Bikeometers are on the Arlington Loop, and if not, choose Bikeometers that are on the loop. My hypothesis is that bike trails that weren’t popular before the pandemic won’t see much change after the pandemic. I generally either ride the Arlington Loop or the W&OD Trails as they are the most bike-friendly. A trail that is less bike-friendly before the pandemic won’t become that much more bike-friendly during a pandemic, even if there is a reduction of traffic in the city.
I’ll use the GPS coordinates from my Garmin GPS watch to create a layer for my plot that shows the Arlington Loop trail. I downloaded the gpx file from the Garmin Connect website.